This report analyzes the ‘global shark attacks’ dataset, which was retrieved from http://www.sharkattackfile.net/ on 29.03.2020 and contains current and historical data on shark/human interactions. Our goal is to better understand shark behavior and to test a few hypotheses.
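# readxl provides read_excel() for importing .xls/.xlsx files
library(readxl)
# Import the spreadsheet; column types are guessed by readxl
shark <- read_excel("GSAF5.xls")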
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting logical in W1678 / R1678C23: got 'stopped here'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting logical in X4619 / R4619C24: got 'Teramo'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting logical in X6045 / R6045C24: got 'change filename'
## New names:
## * `Case Number` -> `Case Number...1`
## * `Case Number` -> `Case Number...20`
## * `Case Number` -> `Case Number...21`
## * `` -> ...23
## * `` -> ...24
View(shark[1:20,])
str(shark)
## Classes 'tbl_df', 'tbl' and 'data.frame': 25775 obs. of 24 variables:
## $ Case Number...1 : chr "2020.02.05" "2020.01.30.R" "2020.01.17" "2020.01.16" ...
## $ Date : chr "05-Feb-2020" "Reported 30-Jan-2020" "17-Jan-2020" "16-Jan-2020" ...
## $ Year : chr "2020" "2020" "2020" "2020" ...
## $ Type : chr "Unprovoked" "Provoked" "Unprovoked" "Unprovoked" ...
## $ Country : chr "USA" "BAHAMAS" "AUSTRALIA" "NEW ZEALAND" ...
## $ Area : chr "Maui" "Exumas" "New South Wales" "Southland" ...
## $ Location : chr NA NA "Windang Beach" "Oreti Beach" ...
## $ Activity : chr "Stand-Up Paddle boarding" "Floating" "Surfing" "Surfing" ...
## $ Name : chr NA "Ana Bruna Avila" "Will Schroeter" "Jordan King" ...
## $ Sex : chr NA "F" "M" "F" ...
## $ Age : chr NA "24" "59" "13" ...
## $ Injury : chr "No injury, but paddleboard bitten" "PROVOKED INCIDENT Scratches to left wrist" "Laceration ot left ankle and foot" "Minor injury to lower leg" ...
## $ Fatal (Y/N) : chr "N" "N" "N" "N" ...
## $ Time : chr "09h40" NA "08h00" "20h30" ...
## $ Species : chr "Tiger shark" NA "\"A small shark\"" "Broadnose seven gill shark?" ...
## $ Investigator or Source: chr "K. McMurray, TrackingSharks.com" "K. McMurray, TrackingSharks.com" "B. Myatt & M. Michaelson, GSAF; K. McMurray, TrackingSharks.com" "K. McMurray, TrackingSharks.com" ...
## $ pdf : chr "2020.02.05.Maui.pdf" "2020.01.30.R-Avila.pdf" "2020.01.17-Schroeter.pdf" "2020.01.16-King.pdf" ...
## $ href formula : chr "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.02.05.Maui.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.30.R-Avila.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.17-Schroeter.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.16-King.pdf" ...
## $ href : chr "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.02.05.Maui.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.30.R-Avila.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.17-Schroeter.pdf" "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.16-King.pdf" ...
## $ Case Number...20 : chr "2020.02.05" "2020.01.30.R" "2020.01.17" "2020.01.16" ...
## $ Case Number...21 : chr "2020.02.05" "2020.01.30.R" "2020.01.17" "2020.01.16" ...
## $ original order : chr "6506" "6505" "6504" "6503" ...
## $ ...23 : logi NA NA NA NA NA NA ...
## $ ...24 : logi NA NA NA NA NA NA ...
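First, we investigate, clean, and prepare each variable in the dataset. The str() output above shows why this step matters: every named column, including Year and Age, was imported as character, the case number appears in three columns, and the last two columns are unnamed and hold only NA in the rows shown. We need to know what kind of data we are dealing with before preparing it for further analysis.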
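As a starting point, a first cleaning pass could look like the minimal sketch below. It is not the procedure used in this report. The helper `clean_columns()` and the toy data frame `toy` are hypothetical and exist only for illustration; the toy rows copy a few values from the str() output, plus two empty rows. The sketch rests on two assumptions that should be checked against the real data: that columns named like `...23` carry no usable information, and that the 25775 imported rows include rows that are entirely empty (the `original order` column starts at 6506, which hints at this).

```r
# Toy stand-in for the imported data (hypothetical, not the full GSAF file)
toy <- data.frame(
  c1 = c("2020.02.05", "2020.01.30.R", NA, NA),
  ty = c("Unprovoked", "Provoked", NA, NA),
  c2 = c("2020.02.05", "2020.01.30.R", NA, NA),
  c3 = c("2020.02.05", "2020.01.30.R", NA, NA),
  oo = c("6506", "6505", NA, NA),
  u1 = c(NA, NA, NA, NA),
  u2 = c(NA, NA, NA, NA),
  stringsAsFactors = FALSE
)
# Use the column names that readxl produced in the import above
names(toy) <- c("Case Number...1", "Type", "Case Number...20",
                "Case Number...21", "original order", "...23", "...24")

clean_columns <- function(df) {
  # Drop unnamed columns such as "...23" (assumed to hold no real data)
  unnamed <- grepl("^\\.\\.\\.[0-9]+$", names(df))
  df <- df[, !unnamed, drop = FALSE]

  # Drop rows in which every remaining field is NA
  df <- df[rowSums(!is.na(df)) > 0, , drop = FALSE]

  # Drop a case-number copy only if it is identical to the first column
  first <- "Case Number...1"
  for (copy in c("Case Number...20", "Case Number...21")) {
    if (copy %in% names(df) && identical(df[[copy]], df[[first]])) {
      df[[copy]] <- NULL
    }
  }
  names(df)[names(df) == first] <- "Case Number"
  df
}

cleaned <- clean_columns(toy)
str(cleaned)
```

On the real data the same call would be `clean_columns(shark)`. Copies of the case number that differ from the first column are kept on purpose, so that any mismatch can be inspected before deciding which column to trust.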